Motivation

Predict the number of ship arrivals for the 14 days following the last known arrival date. I use Facebook's Prophet model, which decomposes a time series into components such as trend and seasonality (much like the Holt-Winters model), to forecast arrivals for the next two weeks.
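As a rough guide to what "decomposes" means here, Prophet's own description of the model is additive. The symbols below come from that description, not from this notebook:

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here g(t) is the trend, s(t) the periodic seasonality (for example weekly and yearly), h(t) the holiday effects, and \varepsilon_t the error term. No holidays are passed to the model in this notebook, so the h(t) term plays no part in the fit below.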

  • I am training the model for only one port. This can easily be extended to other ports with Spark (repartition on the port and use a pandas UDF to train the models in parallel) or even with joblib on a single PC.
  • The model is fit for one of the detected ports, at the coordinates 38.895642, 118.668746. You can verify the location in the Google satellite image view as well.
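The joblib route mentioned above could look roughly like the sketch below. It is not part of the original notebook: the helper names `split_by_port`, `forecast_one_port` and `forecast_all_ports` are hypothetical, and it assumes joblib is installed and that the input has the `cluster_labels`, `date` and `number_ships` columns loaded later in this notebook. Each worker returns a forecast frame rather than the fitted model, so only plain DataFrames cross process boundaries.

```python
import pandas as pd


def split_by_port(ship_counts: pd.DataFrame) -> dict:
    """Return {cluster_label: frame with the 'ds' and 'y' columns Prophet expects}."""
    frames = {}
    for label, group in ship_counts.groupby("cluster_labels"):
        frames[label] = (
            group[["date", "number_ships"]]
            .rename(columns={"date": "ds", "number_ships": "y"})
            .reset_index(drop=True)
        )
    return frames


def forecast_one_port(frame: pd.DataFrame, periods: int = 14) -> pd.DataFrame:
    """Fit Prophet on one port's history and forecast `periods` days ahead."""
    # Imported lazily so split_by_port can be used without fbprophet installed.
    from fbprophet import Prophet

    model = Prophet()
    model.fit(frame)
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


def forecast_all_ports(frames: dict, n_jobs: int = 2) -> dict:
    """Run forecast_one_port for every port in parallel worker processes."""
    # Assumption: joblib is available; it is not imported elsewhere in this notebook.
    from joblib import Parallel, delayed

    labels = list(frames)
    results = Parallel(n_jobs=n_jobs)(
        delayed(forecast_one_port)(frames[label]) for label in labels
    )
    return dict(zip(labels, results))
```

With the data loaded below, `forecast_all_ports(split_by_port(ship_counts))` would return one forecast frame per cluster label. Cluster 1 has only 78 rows, so its forecast would likely be much less reliable than the one for cluster 0.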

Libraries used

pandas, fbprophet, folium (for maps).

In [19]:
import pandas as pd
import numpy as np
import matplotlib.style as style
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from fbprophet import Prophet
from IPython.display import Image
import folium
style.use('ggplot') 
In [3]:
ship_counts = pd.read_csv("data/port_counts/geohash_wwu_ports.csv")
print(ship_counts.cluster_labels.value_counts())
ship_counts.head()
0    2533
1      78
Name: cluster_labels, dtype: int64
Out[3]:
cluster_labels date port_lat port_lon number_ships
0 0 2012-05-31 38.895642 118.668746 6.0
1 0 2012-06-01 38.895642 118.668746 9.0
2 0 2012-06-02 38.895642 118.668746 8.0
3 0 2012-06-03 38.895642 118.668746 6.0
4 0 2012-06-04 38.895642 118.668746 5.0

Show port location for cluster 0

In [21]:
map2 = folium.Map(location=[38.895642, 118.668746], tiles='CartoDB dark_matter', zoom_start=11)

folium.Marker([38.895642, 118.668746], popup='detected port location').add_to(map2)
map2
Out[21]:
[Interactive folium map centered on the detected port location]
In [4]:
from math import radians, cos, sin, asin, sqrt

R = 6371.0088  # mean Earth radius in km


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    dLat = radians(lat2 - lat1)
    dLon = radians(lon2 - lon1)
    lat1 = radians(lat1)
    lat2 = radians(lat2)
    a = sin(dLat/2)**2 + cos(lat1)*cos(lat2)*sin(dLon/2)**2
    c = 2*asin(sqrt(a))
    return R * c
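The scalar haversine helper above is defined but not called again in this notebook. If it were used to measure how far many ship positions lie from a port, a vectorized variant would avoid a Python loop. The following is a minimal sketch under that assumption; `haversine_np` and the sample ship positions are hypothetical, and only the port coordinates come from the data.

```python
import numpy as np

R = 6371.0088  # mean Earth radius in km, same value as the scalar helper


def haversine_np(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or arrays in decimal degrees."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    a = (
        np.sin((lat2 - lat1) / 2) ** 2
        + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * R * np.arcsin(np.sqrt(a))


port_lat, port_lon = 38.895642, 118.668746
# Hypothetical ship positions: the port itself and a point one degree further north.
ship_lats = np.array([38.895642, 39.895642])
ship_lons = np.array([118.668746, 118.668746])
distances_km = haversine_np(ship_lats, ship_lons, port_lat, port_lon)
```

One degree of latitude corresponds to roughly 111 km, which gives a quick sanity check on the second distance.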
In [8]:
print(ship_counts.loc[ship_counts.cluster_labels == 0].date.min(),
      ship_counts.loc[ship_counts.cluster_labels == 0].date.max())
ship_counts.loc[ship_counts.cluster_labels == 0].number_ships.hist()
2012-05-31 2020-01-15
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x1a1ec13550>

Fit the fbprophet time-series model

Make predictions for the 14 days after the last known date in the dataset for the port.

In [10]:
# Select the single port (cluster 0) and rename the columns to the
# names Prophet expects: ds (date) and y (value to forecast).
port_arrivals = (
    ship_counts.loc[ship_counts.cluster_labels == 0, ["date", "number_ships"]]
    .rename(columns={"date": "ds", "number_ships": "y"})
)

m = Prophet()
m.fit(port_arrivals)
INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[10]:
<fbprophet.forecaster.Prophet at 0x1a2050e668>
In [11]:
future = m.make_future_dataframe(periods=14)
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Out[11]:
ds yhat yhat_lower yhat_upper
2542 2020-01-25 10.706179 0.130169 20.877839
2543 2020-01-26 10.821017 0.731879 21.195544
2544 2020-01-27 11.047220 1.075134 20.898182
2545 2020-01-28 10.323759 -0.457967 20.142680
2546 2020-01-29 10.018505 -0.250649 20.468042
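The tail above shows only the last five of the 14 forecast days, and the lower bound dips below zero in the final two rows even though an arrival count cannot be negative. One possible post-processing step, not part of the original notebook, is to keep only the rows after the last observed date and floor the predictions at zero. The helper name `future_horizon` is hypothetical, and the sample rows are copied from the output above.

```python
import pandas as pd


def future_horizon(forecast: pd.DataFrame, last_observed, floor: float = 0.0) -> pd.DataFrame:
    """Keep forecast rows dated after `last_observed` and floor the predictions."""
    cols = ["yhat", "yhat_lower", "yhat_upper"]
    is_future = pd.to_datetime(forecast["ds"]) > pd.Timestamp(last_observed)
    out = forecast.loc[is_future, ["ds"] + cols].copy()
    # Arrival counts cannot be negative, so clip every prediction column at the floor.
    out[cols] = out[cols].clip(lower=floor)
    return out


# Sample rows copied from the forecast tail shown above.
sample = pd.DataFrame({
    "ds": pd.to_datetime(["2020-01-27", "2020-01-28", "2020-01-29"]),
    "yhat": [11.047220, 10.323759, 10.018505],
    "yhat_lower": [1.075134, -0.457967, -0.250649],
    "yhat_upper": [20.898182, 20.142680, 20.468042],
})
horizon = future_horizon(sample, last_observed="2020-01-15")
```

In this notebook the call would be `future_horizon(forecast, port_arrivals.ds.max())`, which should return the 14 rows from 2020-01-16 to 2020-01-29.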
In [12]:
fig1 = m.plot(forecast)
In [13]:
fig2 = m.plot_components(forecast)
In [14]:
from fbprophet.plot import plot_plotly
import plotly.offline as py
py.init_notebook_mode()

fig = plot_plotly(m, forecast)  # This returns a plotly Figure
py.iplot(fig)